Statistical language modeling using a variable context length

Author

  • Reinhard Kneser
Abstract

In this paper we investigate statistical language models with a variable context length. For such models the number of relevant words in a context is not fixed, as in conventional M-gram models, but depends on the context itself. We develop a measure for the quality of variable-length models and present a pruning algorithm, based on this measure, for creating such models. We also address the question of how a special backing-off distribution can improve the language models. Experiments were performed on two databases, the ARPA NAB corpus and the German Verbmobil corpus. The results show that variable-length models outperform conventional models of the same size. Furthermore, if a moderate loss in performance is acceptable, the size of a language model can be reduced drastically with the presented pruning algorithm.

Similar resources

Using Multiple-Variable Matching to Identify EFL Ecological Sources of Differential Item Functioning

Context is a vague notion with numerous building blocks, which makes inferences from language test scores quite convoluted. This study used a model of item responding that seeks to theorize the contextual infrastructure of differential item functioning (DIF) research and to help specify the sources of DIF. Two steps were taken in this research: first, to identify DIF by gender grouping via l...

Strategic Competence and Foreign Language test Performance in Iranian Context

A number of studies have addressed the integral role of learner strategy use in foreign/second language learning. However, few of these studies have considered the relationships between strategic competence, its use, and foreign language performance (FLP). This study applied structural equation modeling to investigate in depth the relationships between test takers’ strategy use and their per...

A General MCMC Method for Bayesian Inference in Logic-Based Probabilistic Modeling

We propose a general MCMC method for Bayesian inference in logic-based probabilistic modeling. It covers a broad class of generative models including Bayesian networks and PCFGs. The idea is to generalize an MCMC method for PCFGs to the one for a Turing-complete probabilistic modeling language PRISM in the context of statistical abduction where parse trees are replaced with explanations. We des...

Tone modeling using Gaussian process latent variable model for statistical speech synthesis

In continuous Thai speech, tone pronunciation is affected by several factors. One significant factor is stress, which causes diversity in the F0 contours of tones and affects syllable durations. Our previous studies have shown that a stressed/unstressed syllable context improves tone modeling accuracy. However, the stress in Thai language is generally unknown for a given input text ...

Publication date: 1996